home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Technotools
/
Technotools (Chestnut CD-ROM)(1993).ISO
/
unix
/
minix
/
minix.txt
< prev
next >
Wrap
Text File
|
1979-12-31
|
19KB
|
462 lines
MINIX: A CHEAP UNIX CLONE WITH SOURCE CODE
Andrew S. Tanenbaum
Dept. of Mathematics and Computer Science
Vrije Universiteit
Amsterdam, The Netherlands
1. OVERVIEW OF THE MINIX SYSTEM ARCHITECTURE
UNIX- is organized as a single executable program that
is loaded into memory at system boot time and then run.
MINIX is structured in a much more modular way, as a collec-
tion of processes that communicate with each other and with
user processes by sending and receiving messages. There are
separate processes for the memory manager, the file system,
for each device driver, and for certain other system func-
tions. This structure enforces a better interface between
the pieces. The file system cannot, for example, acciden-
tally change the memory manager's tables because the file
system and memory manager each have their own private
address spaces.
These system processes are each full-fledged processes,
with their own memory allocation, process table entry and
state. They can be run, blocked, and send messages, just as
the user processes. In fact, the memory manager and file
system each run in user space as ordinary processes. The
device drivers are all linked together with the kernel into
the same binary program, but they communicate with each
other and with the other processes by message passing.
When the system is compiled, four binary programs are
independently created: the kernel (including the driver
processes), the memory manager, the file system, and init
(which reads /etc/ttys and forks off the login processes).
In other words, compiling the system results in four dis-
tinct a.out files. When the system is booted, all four of
these are read into memory from the boot diskette.
It is possible, and in fact, normal, to modify, recom-
pile, and relink, say, the file system, without having to
relink the other three pieces. This design provides a high
degree of modularity by dividing the system up into indepen-
dent pieces, each with a well-defined function and interface
to the other pieces. The pieces communicate by sending and
receiving messages.
The various processes are structured in four layers:
4. The user processes (top layer).
3. The server processes (memory manager and file system).
_________________________
- UNIX is a trademark of AT&T Bell Laboratories.
2. The device drivers, one process per device.
1. Process and message handling (bottom layer).
Let us now briefly summarize the function of each layer.
Layer 1 is concerned with doing process management
including CPU scheduling and interprocess communication.
When a process does a SEND or RECEIVE, it traps to the ker-
nel, which then tries to execute the command. If the com-
mand cannot be executed (e.g., a process does a RECEIVE and
there are no messages waiting for it), the caller is blocked
until the command can be executed, at which time the process
is reactivated. When an interrupt occurs, layer 1 converts
it into a message to the appropriate device driver, which
will normally be blocked waiting for it. The decision about
which process to run when is also made in layer 1. A prior-
ity algorithm is used, giving device drivers higher priority
over ordinary user processes, for example.
Layer 2 contains the device drivers, one process per
major device. These processes are part of the kernel's
address space because they must run in kernel mode to access
I/O device registers and execute I/O instructions. Although
the IBM PC does not have user mode/kernel mode, most other
machines do, so this decision has been made with an eye
toward the future. To distinguish the processes within the
kernel from those in user space, the kernel processes are
called tasks.
Layer 3 contains only two processes, the memory manager
and the file system. They are both structured as servers,
with the user processes as clients. When a user process
(i.e., a client) wants to execute a system call, it calls,
for example, the library procedure read with the file
descriptor, buffer, and count. The library procedure builds
a message containing the system call number and the parame-
ters and sends it to the file system. The client then
blocks waiting for a reply. When the file system receives
the message, it carries it out and sends back a reply con-
taining the number of bytes read or the error code. The
library procedure gets the reply and returns the result to
the caller in the usual way. The user is completely unaware
of what is going on here, making it easy to replace the
local file system with a remote one.
Layer 4 contains the user programs. When the system
comes up, init forks off login processes, which then wait
for input. On a successful login, the shell is executed.
Processes can fork, resulting in a tree of processes, with
init at the root. When CTRL-D is typed to a shell, it
exits, and init replaces the shell with another login pro-
cess.
2. LAYER 1 - PROCESSES AND MESSAGES
The two basic concepts on which MINIX is built are
processes and messages. A process is an independently
schedulable entity with its own process table entry. A mes-
sage is a structure containing the sender's process number,
- 2 -
a message type field, and a variable part (a union) contain-
ing the parameters or reply codes of the message. Message
size is fixed, depending on how big the union happens to be
on the machine in question. On the IBM PC it is 24 bytes.
Three kernel calls are provided:
- RECEIVE(source, &message)
- SEND(destination, &message)
- SENDREC(process, &message)
These are the only true system calls (i.e., traps to the
kernel). RECEIVE announces the willingness of the caller to
accept a message from a specified process, or ANY, if the
RECEIVER will accept any message. (From here on, ``pro-
cess'' also includes the tasks.) If no message is available,
the receiving process is blocked. SEND attempts to transmit
a message to the destination process. If the destination
process is currently blocked trying to receive from the
sender, the kernel copies the message from the sender's
buffer to the receiver's buffer, and then marks them both as
runnable. If the receiver is not waiting for a message from
the sender, the sender is blocked.
The SENDREC primitive combines the functions of the
other two. It sends a message to the indicated process, and
then blocks until a reply has been received. The reply
overwrites the original message. User processes use SENDREC
to execute system calls by sending messages to the servers
and then blocking until the reply arrives.
There are two ways to enter the kernel. One way is by
the trap resulting from a process' attempt to send or
receive a message. The other way is by an interrupt. When
an interrupt occurs, the registers and machine state of the
currently running process are saved in its process table
entry. Then a general interrupt handler is called with the
interrupt number as parameter. This procedure builds a mes-
sage of type INTERRUPT, copies it to the buffer of the wait-
ing task, marks that task as runnable (unblocked), and then
calls the scheduler to see who to run next.
The scheduler maintains three queues, corresponding to
layers 2, 3, and 4, respectively. The driver queue has the
highest priority, the server queue has middle priority, and
the user queue has lowest priority. The scheduling algo-
rithm is simple: find the highest priority queue that has at
least one process on it, and run the first process on that
queue. In this way, a clock interrupt will cause a process
switch if the file system was running, but not if the disk
driver was running. If the disk driver was running, the
clock task will be put at the end of the highest priority
queue, and run when its turn comes.
In addition to this rule, once every 100 msec, the
clock task checks to see if the current process is a user
process that has been running for at least 100 msec. If so,
that user is removed from the front of the user queue and
put on the back. In effect, compute bound user processes
- 3 -
are run using a round robin scheduler. Once started, a user
process runs until either it blocks trying to send or
receive a message, or it has had 100 msec of CPU time. This
algorithm is simple, fair, and easy to implement.
3. LAYER 2 - DEVICE DRIVERS
Like all versions of UNIX for the IBM PC, MINIX does
not use the ROM BIOS for input or output because the BIOS
does not support interrupts. Suppose a user forks off a
compilation in the background and then calls the editor. If
the editor tried to read from the terminal using the BIOS,
the compilation (and any other background jobs such as
printing) would be stopped dead in their tracks waiting for
the the next character to be typed. Such behavior may be
acceptable in the MS-DOS world, but it certainly is not in
the UNIX world. As a result, MINIX contains a complete set
of drivers that duplicate the functions of the BIOS. Like
the rest of MINIX, these drivers are written in C, not
assembly language.
This design has important implications for running
MINIX on PC clones. A clone whose hardware is not compati-
ble with the PC down to the chip level, but which tries to
hide the differences by making the BIOS calls functionally
identical to IBM's will not run an unmodified MINIX because
MINIX does not use the BIOS.
Each device driver is a separate process in MINIX. At
present, the drivers include the clock driver, terminal
driver, various disk drivers (e.g., RAM disk, floppy disk),
and printer driver. Each driver has a main loop consisting
of three actions:
1. Wait for an incoming message.
2. Perform the request contained in the message.
3. Send a reply message.
Request messages have a standard format, containing the
opcode (e.g., READ, WRITE, or IOCTL), the minor device
number, the position (e.g., disk block number), the buffer
address, the byte count, and the number of the process on
whose behalf the work is being done.
As an example of where device drivers fit in, consider
what happens when a user wants to read from a file. The
user sends a message to the file system. If the file system
has the needed data in its buffer cache, they are copied
back to the user. Otherwise, the file system sends a mes-
sage to the disk task requesting that the block be read into
a buffer within the file system's address space (in its
cache). Users may not send messages to the tasks directly.
Only the servers may do this.
MINIX supports a RAM disk. In fact, the RAM disk is
always used to hold the root device. When the system is
booted, after the operating system has been loaded, the user
is instructed to insert the root file system diskette. The
file system then sees how big it is, allocates the necessary
- 4 -
memory, and copies the diskette to the RAM disk. Other file
systems can then be mounted on the root device.
This organization puts important directories such as
/bin and /tmp on the fastest device, and also makes it easy
to work with either floppy disks or hard disks or a mixture
of the two by mounting them on /usr or /user or elsewhere.
In any event, the root device is always in the same place.
In the standard distribution, the RAM disk is about
240K, most of which is full of parts of the C compiler. In
the 256K system, a much smaller RAM disk has to be used,
which explains why this version has no C compiler: there is
no place to put it. (The /usr diskette is completely full
with the other utility programs and one of the design goals
was to make the system run on a 256K PC with 1 floppy disk.)
Users with an unusual configuration such as 256K and three
hard disks are free to juggle things around as they see fit.
The terminal driver is compatible with the standard V7
terminal driver. It supports cooked mode, raw mode, and
cbreak mode. It also supports several escape sequences,
such as cursor positioning and reverse scrolling because the
screen editor needs them.
The printer driver copies its input to the printer
character for character without modification. It does not
even convert line feed to carriage return + line feed. This
makes it possible to send escape sequences to graphics
printers without the driver messing things up. MINIX does
not spool output because floppy disk systems rarely have
enough spare disk space for the spooling directory. Instead
one normally would print a file f by saying
lpr <f &
to do the printing in the background. The lpr program
insert carriage returns, expands tabs, and so on, so it
should only be used for straight ASCII files. On hard disk
systems, a spooler would not be difficult to write.
4. LAYER 3 - SERVERS
Layer 3 contains two server processes: the memory
manager and the file system. They are both structured in
the same way as the device drivers, that is a main loop that
accepts requests, performs them, and then replies. We will
now look at each of these in turn.
The memory manager's job is to handle those system
calls that affect memory allocation, as well as a few oth-
ers. These include FORK, EXEC, WAIT, KILL, and BRK. The
memory model used by MINIX is exceptionally simple in order
to accommodate computers without any memory management
hardware. When the shell forks off a process, a copy of the
shell is made in memory. When the child does an EXEC, the
new core image is placed in memory. Thereafter it is never
moved. MINIX does not swap or page.
The amount of memory allocated to the process is deter-
mined by a field in the header of the executable file. A
- 5 -
program, chmem, has been provided to manipulate this field.
When a process is started, the text segment is set at the
very bottom of the allocated memory area, followed by the
data and bss. The stack starts at the top of the allocated
memory and grows downward. The space between the bottom of
the stack and the top of the data segment is available for
both segments to grow into as needed. If the two segments
meet, the process is killed.
In the past, before paging was invented, all memory
allocation schemes worked like this. In the future, when
even small microcomputers will use 32-bit CPUs and 1M x 1
bit memory chips, the minimum feasible memory will be 4
megabytes and this allocation scheme will probably become
popular again due to its inherent simplicity. Thus the
MINIX scheme can be regarded as either hopelessly outdated
or amazingly futuristic, as you prefer.
The memory manager keeps track of memory using a list
of holes. When new memory is needed, either for FORK or for
EXEC, it searches the hole list and takes the first hole
that is big enough (first fit). When a process terminates,
if it is adjacent to a hole on either side, the process'
memory and the hole are merged into a bigger hole.
The file system is really a remote file server that
happens to be running on the user's machine. However it is
straightforward to convert it into a true network file
server. All that needs to be done is change the message
interface and provide some way of authenticating requests.
(In MINIX, the source field in the incoming message is
trustworthy because it is filled in by the kernel.) When
running remote, the MINIX file server maintains state infor-
mation, like RFS and unlike NFS.
The MINIX file system is similar to that of V7 UNIX.
The i-node is slightly different, containing only 9 disk
addresses instead of 13, and only 1 time instead of 3.
These changes reduce the i-node from 64 bytes to 32 bytes,
to store more i-nodes per disk block and reduce the size of
the in-core i-node table.
Free disk blocks and free inodes are kept track of
using bit maps rather than free lists. The bit maps for the
root device and all mounted file systems are kept in memory.
When a file grows, the system makes a definite effort to
allocate the new block as close as possible to the old ones,
to minimize arm motion. Disk storage is not necessarily
allocated one block at a time. A minor device can be con-
figured to allocate 2, 4 (or more) contiguous blocks when-
ever a block is allocated. Although this wastes disk space,
these multiblock zones improve disk performance by keeping
file blocks close together. The standard parameters for
MINIX as distributed are 1K blocks and 1K zones (i.e., just
1 block per zone).
MINIX maintains a buffer cache of recently used blocks.
A hashing algorithm is used to look up blocks in the cache.
When an i-node block, directory block, or other critical
block is modified, it is written back to disk immediately.
- 6 -
Data blocks are only written back at the next SYNC or when
the buffer is needed for something else.
The MINIX directory system and format is identical to
that of V7 UNIX. File names are strings of up to 14 charac-
ters, and directories can be arbitrarily long.
5. LAYER 4 - USER PROCESSES
This layer contains init, the shell, the editor, the
compiler, the utilities, and all the user processes. These
processes may only send messages to the memory manager and
the file system, and these servers only accept valid system
call requests. Thus the user processes do not perceive
MINIX to be a general-purpose message passing system. How-
ever, removing the one line of code that checks if the mes-
sage destination is valid would convert it into a much more
general system (but less UNIX-like).